Pulaski County
A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research
Chung, Simon, Vorland, Colby J., Maney, Donna L., Brown, Andrew W.
Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences about those labels, and which deviates from the population frequencies in a known manner, creates challenges. In this paper, we consider a multivariate Bernoulli distribution as our underlying distribution of a multi-label problem. We present a novel sampling algorithm that takes label dependencies into account. It uses observed label frequencies to estimate multivariate Bernoulli distribution parameters and calculate weights for each label combination. This approach ensures the weighted sampling acquires target distribution characteristics while accounting for label dependencies. We applied this approach to a sample of research articles from Web of Science labeled with 64 biomedical topic categories. We aimed to preserve category frequency order, reduce frequency differences between most and least common categories, and account for category dependencies. This approach produced a more balanced sub-sample, enhancing the representation of minority categories.
- North America > United States > Arkansas > Pulaski County > Little Rock (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Indiana > Monroe County > Bloomington (0.04)
- (4 more...)
Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4
Makohon, Ivan, Najafi, Mohamad, Wu, Jian, Brochhausen, Mathias, Li, Yaohang
In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients' assessments, diagnoses, and treatments are captured in these EHRs in free-form text by physicians, who spend a considerable amount of time entering and editing them. Manually writing clinical notes takes a considerable amount of a doctor's valuable time, increasing the patient's waiting time and possibly delaying diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the usage of Chain-of-Thought (CoT) prompt engineering to improve the LLM's response in clinical note generation. In our prompts, we use as input International Classification of Diseases (ICD) codes and basic patient information. We investigate a strategy that combines the traditional CoT with semantic search results to improve the quality of generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from clinical ontology to further enrich the domain-specific knowledge of generated clinical notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4 and our results show that it outperformed the clinical notes generated by standard one-shot prompts.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
- North America > United States > Arkansas > Pulaski County > Little Rock (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Policy-Aware Generative AI for Safe, Auditable Data Access Governance
Mandalawi, Shames Al, Mohammed, Muzakkiruddin Ahmed, Maclean, Hendrika, Cakmak, Mert Can, Talburt, John R.
Enterprises need access decisions that satisfy least privilege, comply with regulations, and remain auditable. We present a policy aware controller that uses a large language model (LLM) to interpret natural language requests against written policies and metadata, not raw data. The system, implemented with Google Gemini~2.0 Flash, executes a six-stage reasoning framework (context interpretation, user validation, data classification, business purpose test, compliance mapping, and risk synthesis) with early hard policy gates and deny by default. It returns APPROVE, DENY, CONDITIONAL together with cited controls and a machine readable rationale. We evaluate on fourteen canonical cases across seven scenario families using a privacy preserving benchmark. Results show Exact Decision Match improving from 10/14 to 13/14 (92.9\%) after applying policy gates, DENY recall rising to 1.00, False Approval Rate on must-deny families dropping to 0, and Functional Appropriateness and Compliance Adherence at 14/14. Expert ratings of rationale quality are high, and median latency is under one minute. These findings indicate that policy constrained LLM reasoning, combined with explicit gates and audit trails, can translate human readable policies into safe, compliant, and traceable machine decisions.
- Asia > Singapore (0.04)
- North America > United States > Arkansas > Pulaski County > Little Rock (0.04)
- Research Report > New Finding (0.49)
- Research Report > Experimental Study (0.47)
Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models
Rho, Donghwan, Seo, Sieun, Sung, Hyewon, Min, Chohong, Ryu, Ernest K.
As users increasingly interact with large language models (LLMs) using private information, secure and encrypted communication becomes essential. Homomorphic encryption (HE) provides a principled solution by enabling computation directly on encrypted data. Although prior work has explored aspects of running LLMs under HE, the challenge of text generation, particularly next-token prediction, has received limited attention and remains a key obstacle to practical encrypted interaction. In this work, we propose a TSP-based token reordering strategy to address the difficulties of encrypted text generation, together with a post-processing step that further reduces approximation error. Theoretical analysis and experimental results demonstrate that our method prevents collapse, improves coherence in generated text, and preserves data privacy throughout. Overall, our contributions advance the feasibility of practical and privacy-preserving LLM inference.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting > Higher Education (0.94)
- Education > Curriculum > Subject-Specific Education (0.69)
Adaptive Cybersecurity Architecture for Digital Product Ecosystems Using Agentic AI
Olayinka, Oluwakemi T., Jeswani, Sumeet, Iloh, Divine
Traditional static cybersecurity models often struggle with scalability, real-time detection, and contextual responsiveness in the current digital product ecosystems which include cloud services, application programming interfaces (APIs), mobile platforms, and edge devices. This study introduces autonomous goal driven agents capable of dynamic learning and context-aware decision making as part of an adaptive cybersecurity architecture driven by agentic artificial intelligence (AI). To facilitate autonomous threat mitigation, proactive policy enforcement, and real-time anomaly detection, this framework integrates agentic AI across the key ecosystem layers. Behavioral baselining, decentralized risk scoring, and federated threat intelligence sharing are important features. The capacity of the system to identify zero-day attacks and dynamically modify access policies was demonstrated through native cloud simulations. The evaluation results show increased adaptability, decreased response latency, and improved detection accuracy. The architecture provides an intelligent and scalable blueprint for safeguarding complex digital infrastructure and is compatible with zero-trust models, thereby supporting the adherence to international cybersecurity regulations.
- North America > United States > Colorado > Denver County > Denver (0.28)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- (7 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
Training the next generation of physicians for artificial intelligence-assisted clinical neuroradiology: ASNR MICCAI Brain Tumor Segmentation (BraTS) 2025 Lighthouse Challenge education platform
Amiruddin, Raisa, Yordanov, Nikolay Y., Maleki, Nazanin, Fehringer, Pascal, Gkampenis, Athanasios, Janas, Anastasia, Krantchev, Kiril, Moawad, Ahmed, Umeh, Fabian, Abosabie, Salma, Abosabie, Sara, Alotaibi, Albara, Ghonim, Mohamed, Ghonim, Mohanad, Mhana, Sedra Abou Ali, Page, Nathan, Jakovljevic, Marko, Sharifi, Yasaman, Bhatia, Prisha, Manteghinejad, Amirreza, Guelen, Melisa, Veronesi, Michael, Hill, Virginia, So, Tiffany, Krycia, Mark, Petrovic, Bojan, Memon, Fatima, Cramer, Justin, Schrickel, Elizabeth, Kosovic, Vilma, Vidal, Lorenna, Thompson, Gerard, Ikuta, Ichiro, Albalooshy, Basimah, Nabavizadeh, Ali, Tahon, Nourel Hoda, Shekdar, Karuna, Bhatia, Aashim, Kirsch, Claudia, D'Anna, Gennaro, Lohmann, Philipp, Nour, Amal Saleh, Myronenko, Andriy, Goldman-Yassen, Adam, Reid, Janet R., Aneja, Sanjay, Bakas, Spyridon, Aboian, Mariam
High-quality reference standard image data creation by neuroradiology experts for automated clinical tools can be a powerful tool for neuroradiology & artificial intelligence education. We developed a multimodal educational approach for students and trainees during the MICCAI Brain Tumor Segmentation Lighthouse Challenge 2025, a landmark initiative to develop accurate brain tumor segmentation algorithms. Fifty-six medical students & radiology trainees volunteered to annotate brain tumor MR images for the BraTS challenges of 2023 & 2024, guided by faculty-led didactics on neuropathology MRI. Among the 56 annotators, 14 select volunteers were then paired with neuroradiology faculty for guided one-on-one annotation sessions for BraTS 2025. Lectures on neuroanatomy, pathology & AI, journal clubs & data scientist-led workshops were organized online. Annotators & audience members completed surveys on their perceived knowledge before & after annotations & lectures respectively. Fourteen coordinators, each paired with a neuroradiologist, completed the data annotation process, averaging 1322.9+/-760.7 hours per dataset per pair and 1200 segmentations in total. On a scale of 1-10, annotation coordinators reported significant increase in familiarity with image segmentation software pre- and post-annotation, moving from initial average of 6+/-2.9 to final average of 8.9+/-1.1, and significant increase in familiarity with brain tumor features pre- and post-annotation, moving from initial average of 6.2+/-2.4 to final average of 8.1+/-1.2. We demonstrate an innovative offering for providing neuroradiology & AI education through an image segmentation challenge to enhance understanding of algorithm development, reinforce the concept of data reference standard, and diversify opportunities for AI-driven image analysis among future physicians.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > South Carolina > Charleston County > Charleston (0.14)
- North America > United States > Missouri > Boone County > Columbia (0.14)
- (29 more...)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report (0.83)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
A Generalized Genetic Random Field Method for the Genetic Association Analysis of Sequencing Data
Li, Ming, He, Zihuai, Zhang, Min, Zhan, Xiaowei, Wei, Changshuai, Elston, Robert C, Lu, Qing
With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Michigan > Ingham County > Lansing (0.14)
- North America > United States > Michigan > Ingham County > East Lansing (0.14)
- (2 more...)
- Research Report > New Finding (0.96)
- Research Report > Experimental Study (0.94)
A Weighted U Statistic for Genetic Association Analyses of Sequencing Data
Wei, Changshuai, Li, Ming, He, Zihuai, Vsevolozhskaya, Olga, Schaid, Daniel J., Lu, Qing
With advancements in next generation sequencing technology, a massive amount of sequencing data are generated, offering a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, this poses a great challenge for the statistical analysis of high-dimensional sequencing data. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a weighted U statistic, referred to as WU-seq, for the high-dimensional association analysis of sequencing data. Based on a non-parametric U statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used SKAT method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-seq to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Michigan > Ingham County > Lansing (0.14)
- North America > United States > Michigan > Ingham County > East Lansing (0.14)
- (2 more...)
Turning Up the Heat: Assessing 2-m Temperature Forecast Errors in AI Weather Prediction Models During Heat Waves
Ennis, Kelsey E., Barnes, Elizabeth A., Arcodia, Marybeth C., Fernandez, Martin A., Maloney, Eric D.
Extreme heat is the deadliest weather-related hazard in the United States. Furthermore, it is increasing in intensity, frequency, and duration, making skillful forecasts vital to protecting life and property. Traditional numerical weather prediction (NWP) models struggle with extreme heat for medium-range and subseasonal-to-seasonal (S2S) timescales. Meanwhile, artificial intelligence-based weather prediction (AIWP) models are progressing rapidly. However, it is largely unknown how well AIWP models forecast extremes, especially for medium-range and S2S timescales. This study investigates 2-m temperature forecasts for 60 heat waves across the four boreal seasons and over four CONUS regions at lead times up to 20 days, using two AIWP models (Google GraphCast and Pangu-Weather) and one traditional NWP model (NOAA United Forecast System Global Ensemble Forecast System (UFS GEFS)). First, case study analyses show that both AIWP models and the UFS GEFS exhibit consistent cold biases on regional scales in the 5-10 days of lead time before heat wave onset. GraphCast is the more skillful AIWP model, outperforming UFS GEFS and Pangu-Weather in most locations. Next, the two AIWP models are isolated and analyzed across all heat waves and seasons, with events split among the model's testing (2018-2023) and training (1979-2017) periods. There are cold biases before and during the heat waves in both models and all seasons, except Pangu-Weather in winter, which exhibits a mean warm bias before heat wave onset. Overall, results offer encouragement that AIWP models may be useful for medium-range and S2S predictability of extreme heat.
- North America > United States > California (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > North Carolina > Buncombe County > Asheville (0.04)
- (10 more...)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
Geological Inference from Textual Data using Word Embeddings
Linphrachaya, Nanmanas, Gómez-Méndez, Irving, Siripatana, Adil
This research explores the use of Natural Language Processing (NLP) techniques to locate geological resources, with a specific focus on industrial minerals. By using word embeddings trained with the GloVe model, we extract semantic relationships between target keywords and a corpus of geological texts. The text is filtered to retain only words with geographical significance, such as city names, which are then ranked by their cosine similarity to the target keyword. Dimensional reduction techniques, including Principal Component Analysis (PCA), Autoencoder, Variational Autoencoder (VAE), and VAE with Long Short-Term Memory (VAE-LSTM), are applied to enhance feature extraction and improve the accuracy of semantic relations. For benchmarking, we calculate the proximity between the ten cities most semantically related to the target keyword and identified mine locations using the haversine equation. The results demonstrate that combining NLP with dimensional reduction techniques provides meaningful insights into the spatial distribution of natural resources. Although the result shows to be in the same region as the supposed location, the accuracy has room for improvement.
- Europe > United Kingdom (0.05)
- Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
- North America > Canada > British Columbia (0.04)
- (32 more...)
- Energy (0.94)
- Materials > Metals & Mining > Lithium (0.50)